369 research outputs found

    Classification algorithms for Big Data with applications in the urban security domain

    A classification algorithm is a versatile tool that can serve as a predictor for the future or as an analytical tool to understand the past. Several obstacles prevent classification from scaling to a large Volume, Velocity, Variety or Value. The aim of this thesis is to scale distributed classification algorithms beyond current limits, assess the state-of-practice of Big Data machine learning frameworks and validate the effectiveness of a data science process in improving urban safety. We found that massive datasets with a number of large-domain categorical features are a difficult challenge for existing classification algorithms. We propose associative classification as a possible answer, and develop several novel techniques to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. The experiments, run on a real large-scale dataset with more than 4 billion records, confirmed the quality of the approach. To assess the state-of-practice of Big Data machine learning frameworks and streamline the process of integration and fine-tuning of the building blocks, we developed a generic, self-tuning tool to extract knowledge from network traffic measurements. The result is a system that offers human-readable models of the data with minimal user intervention, validated by experiments on large collections of real-world passive network measurements. A good portion of this dissertation is dedicated to the study of a data science process to improve urban safety. First, we shed some light on the feasibility of a system to monitor social messages from a city for emergency relief. We then propose a methodology to mine temporal patterns in social issues, like crimes. Finally, we propose a system to integrate the findings of Data Science on the citizenry’s perception of safety and communicate its results to decision makers in a timely manner. We applied and tested the system in a real Smart City scenario, set in Turin, Italy.

    Analyzing spatial data from twitter during a disaster

    Social media can be an invaluable help in a mass emergency, but handling the information can be challenging. One major concern is identifying posts related to the area, or pinning them on a map. This exploratory study analyzes the spatial data coming with tweets during two natural disasters, an earthquake and a hurricane. Geo-tagged tweets prove to be a small fraction of all tweets, and disasters within a limited region appear to be a niche topic in the whole stream. The results can help researchers and practitioners in the design of tools to identify these messages.

    BAC: A bagged associative classifier for big data frameworks

    Big Data frameworks allow powerful distributed computations extending the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset, eventually voting to provide a unique classification outcome. Preliminary experimental results on Apache Spark showed the capability of the proposed ensemble classifier to obtain a quality comparable with the single-machine version on popular real-world datasets, and to overcome its scalability limits on large synthetic datasets.
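The ensemble idea described in this abstract (several models built on different subsets of the data, then a vote) can be illustrated with a minimal single-machine sketch. This is not BAC itself: the base learner below is a toy stand-in that maps each feature value to its most frequent label, and all function names and parameters (`train_bagged`, `n_models`, `seed`) are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def train_base_model(sample):
    """Toy base learner (an assumption, not an associative classifier):
    map each feature value to the label it most often appears with."""
    counts = defaultdict(Counter)
    for value, label in sample:
        counts[value][label] += 1
    # Fallback label for feature values never seen in this sample.
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return rules, default


def predict_base(model, value):
    rules, default = model
    return rules.get(value, default)


def train_bagged(records, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap sample
    (drawn with replacement, same size as the original dataset)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(records) for _ in records]
        models.append(train_base_model(sample))
    return models


def predict_bagged(models, value):
    """Majority vote over the predictions of all base models."""
    votes = Counter(predict_base(m, value) for m in models)
    return votes.most_common(1)[0][0]


records = [("a", "x")] * 10 + [("b", "y")] * 10
models = train_bagged(records)
prediction = predict_bagged(models, "a")
```

In a distributed setting each bootstrap sample would typically be trained by a separate worker; here the loop runs sequentially only to keep the sketch self-contained.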

    Scaling associative classification for very large datasets

    Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier. Most off-the-shelf solutions cannot cope with this problem. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, among which is a preventive pruning of classification rules in the extraction phase based on Gini impurity. We ran experiments on Apache Spark, on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it not only classifies new records but also helps explain both the logic behind the prediction and the properties of the model, becoming a useful aid for decision makers.
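The abstract mentions pruning classification rules by Gini impurity but does not give the procedure. As a rough sketch under stated assumptions, the standard Gini impurity, 1 minus the sum of squared class proportions, can be computed over the labels of the records a rule covers, and rules above a threshold can be discarded. The `prune_rules` helper, the rule representation and the `max_impurity` threshold are hypothetical and are not taken from DAC.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 - sum(p_i ** 2) over class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def prune_rules(rule_coverage, max_impurity=0.3):
    """Keep only rules whose covered records are sufficiently pure.

    rule_coverage maps a rule identifier to the class labels of the
    records it matches (a hypothetical representation); max_impurity
    is an assumed threshold, not a value from the paper.
    """
    return {
        rule: labels
        for rule, labels in rule_coverage.items()
        if gini_impurity(labels) <= max_impurity
    }


coverage = {
    "city=Turin": ["safe", "safe", "safe", "safe"],
    "day=Monday": ["safe", "unsafe"],
}
kept = prune_rules(coverage)
```

A rule covering records of a single class has impurity 0 and is always kept, while a rule split evenly between two classes has impurity 0.5 and is dropped at the assumed threshold.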

    Mathematical formulation to predict the harmonics of the superconducting Large Hadron Collider magnets : III. Precycle ramp rate effects and magnet characterization

    The Large Hadron Collider (LHC) at CERN is equipped with a feed-forward control system known as the field description for the LHC (FiDeL), which is designed to predict the magnetic field and its multipoles, hence reducing the burden on beam-based feedback. FiDeL consists of a physical and empirical parametric field model based on magnetic measurements at warm and in cryogenic conditions. It is particularly critical during beam injection when the field decays and at the beginning of acceleration when the field snaps back. It is known that the decay amplitude is largely affected by the powering history of the magnet, particularly by the precycle flattop current and duration and the preinjection preparation duration. Recently, we have collected data that quantify the dependence of the decay amplitude on the precycle ramp rate. This paper presents the results of the measurements performed to investigate this effect, and the method included in FiDeL to model the precycle dependence. With this complete picture of dynamic changes, we finally discuss the effect on the data taken at nominally constant field, along the magnet loadline. We show that a correction for dynamic changes is required for adequate magnet characterization.

    Experimental Detection of Nonlinear Dynamics Using a Laser Profilometer

    This paper investigates the nonlinear dynamic behaviour of a cantilever beam, in which the nonlinearity is introduced by permanent magnet interactions or by a non-holonomic contact. The experimental time-domain responses obtained from non-zero initial conditions are measured using a laser profilometer, conventionally adopted for product shape detection in online industrial applications. The Fourier transform, Continuous Wavelet transform, and Hilbert transform are used to investigate nonlinear phenomena in the frequency content, highlighting advantages and drawbacks of the three methods in catching instantaneous phenomena. Then, a Multi-Phi approach is proposed to describe the time evolution of nonlinear systems by means of a discrete number of linearised systems. Accordingly, two linearised models have been developed and tuned to describe the dynamic behaviour of different Euler–Bernoulli cantilever beam configurations. The experimental data of the nonlinear systems are compared with the corresponding data of the linear system to evaluate the effects of the introduced nonlinearities on the overall dynamic properties.

    Iceland, an Open-Air Museum for Geoheritage and Earth Science Communication Purposes

    Iceland is one of the most recognizable and iconic places on Earth, offering an unparalleled chance to admire the most powerful natural phenomena related to the combination of geodynamic, tectonic and magmatic forces, such as active rifting, volcanic eruptions and subvolcanic intrusions. We have identified and selected 25 geosites from the Snæfellsnes Peninsula and the Northern Volcanic Zone, areas where most of the above phenomena can be admired as they unfold before the viewers’ eyes. We have qualitatively assessed the selected volcano–tectonic geosites by applying a set of criteria derived from previous studies and illustrated them through field photographs, unmanned aerial vehicle (UAV)-captured images and 3-D models. Finally, we have discussed and compared the different options and advantages provided by such visualization techniques and proposed a novel, cutting-edge approach to geoheritage promotion and popularization, based on interactive, navigable Virtual Outcrops made available online.

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but often require sophisticated fine-tuning and a detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques providing self-learning capabilities to state-of-the-art scalable approaches, jointly with parameter auto-selection to off-load the network expert from parameter tuning. We combine both unsupervised and supervised approaches to mine data with a scalable approach. SeLINA embeds mechanisms to check if the new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA's current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses.


    Wooden music instrument vibro-acoustic fingerprint: the case of a contemporary violin

    Violins are complex wooden musical instruments whose quality is mainly evaluated on the basis of their aesthetics and the historical relevance of their makers. However, their acoustic quality remains a key evaluation parameter for performers and listeners. The perceived quality of the instrument depends, on one side, on the player, the environmental conditions and the listeners’ psychoacoustic factors. On the other side, it depends on the instrument’s materials, construction and setup parameters, which affect its vibro-acoustical characteristics. This work investigates a procedure for the vibro-acoustic characterization of a violin, here called the vibro-acoustic fingerprint, as an example of vibro-acoustical characterization of a wooden musical instrument. The procedure was applied, as a case study, to a contemporary Italian violin built in 2011 by the violin maker Enzo Cena on a Guarneri del Gesù model.